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Abstract 

In  this  paper  we  discuss  several  possible  uses  of  the 
knowledge  of  how  user’s  logical  designs  are  mapped  to 
physical  FPGA  circuits.  Some  of  these  uses  include  power 
analyses,  useful  feedback  on  physical  design  implementa¬ 
tions,  and  direct,  quick  modifications  of  physical  designs. 
As  an  example  of  how  this  knowledge  can  be  used,  we  de¬ 
scribe,  in  detail,  how  to  determine  the  logical-to-physical 
mapping  ofXilinx  XC4000  circuits  created  with  JHDL  and 
how  this  mapping  and  FPGA  state  sampling,  or  readback, 
enables  us  to  provide  a  hardware  debugging  environment 
with  complete  visibility  of  all  flip-flops  and  LUT  RAMs  in 
executing  hardware. 


1  Introduction 

In  supporting  hardware  debugging  in  the  JHDL  [1,  2] 
design  environment,  we  have  found  that  knowing  how  de¬ 
sign  elements  from  the  user’s  logical  design  were  mapped 
to  their  counterparts  in  the  FPGA  physical  implementation 
is  quite  important.  With  only  a  partial  mapping  from  the 
logical  to  the  physical,  we  would  not  be  able  to  provide 
users  of  JHDL  with  a  complete  view  of  what  their  circuit 
is  doing  during  hardware  execution  via  FPGAs’  readback 
mechanism  [3,  4],  Beyond  the  applications  of  debugging, 
this  same  knowledge  can  also  be  used  to  provide  the  de¬ 
signer  with  a  more  detailed  and  understandable  view  of  a 
circuit  once  physically  implemented  and  contribute  to  im¬ 
proved  and  more  understandable  analyses  of  design  char¬ 
acteristics  such  as  dynamic  power  consumption  and  critical 
path  identification. 
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As  a  demonstration  of  how  useful  this  mapping  infor¬ 
mation  is,  this  paper  will  describe  the  process  of  deter¬ 
mining  the  logical-to-physical  mappings  for  XC4000  cir¬ 
cuits  implemented  with  the  Xilinx  Alliance  and  Foundation 
software  tools  and  how  this  mapping  is  applied  to  create  a 
board-independent,  device-specific  mechanism  for  debug¬ 
ging  hardware  in  the  JHDL  environment.  As  background, 
we  will  briefly  review  the  role  of  readback  in  debugging 
FPGA  circuit  designs  and  why  determining  the  logical- 
to-physical  mapping  is  important  for  debugging  hardware 
with  readback.  Next,  we  will  discuss  the  portion  of  the 
JHDL  design  flow  which  applies  to  hardware  debugging, 
namely,  circuit  netlist  creation,  the  processing  of  Xilinx  re¬ 
port  files,  and  the  creation  of  a  JHDL  readback  symbol  ta¬ 
ble.  Following  this,  we  will  discuss  how  board  models  in¬ 
teract  with  and  use  the  JHDL  readback  API  as  well  as  how 
FPGA  vendor  software  can  improve  its  support  for  third- 
party  tools  which  need  to  determine  logical-to-physical  de¬ 
sign  mappings.  Lastly,  we  will  describe  other  possible  uses 
of  this  mapping  information  and  summarize  several  of  the 
lessons  learned  from  this  work. 


2  Background 

After  a  designer  describes  a  circuit  using  a  hardware 
description  language  (HDL)  or  a  schematic,  the  descrip¬ 
tion  is  usually  simulated  to  verify,  to  some  degree,  that  the 
designed  circuit  operates  as  expected.  Design  verification 
through  simulation,  though,  can  be  very  time  consuming, 
especially,  when  design  errors  may  not  be  discovered  un¬ 
til  millions  or  hundreds  of  millions  of  clock  cycles  have 
been  executed;  the  process  can  sometimes  take  hours,  days, 
or  longer.  For  FPGA-based  configurable  computing  ma¬ 
chines  (CCMs),  hardware  debugging  environments  can  be 
used  to  find  and  debug  the  problems  much  more  quickly- 
millions  or  hundreds  of  millions  of  cycles  may  only  require 
seconds  or  minutes  of  execution.  Besides  the  speed  bene- 
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fits,  hardware  debugging  environments  can  also  provide  a 
greater  degree  of  realism  during  circuit  validation. 

One  functionality  found  in  many  modern  FPGAs  which 
greatly  aids  the  hardware  debugging  of  CCM  applications 
is  the  ability  to  sample,  or  readback,  the  state  of  the  FPGA, 
including  the  values  of  flip-flops  and  RAM  contents.  For 
example,  with  the  Xilinx  XC4000  and  Virtex  families  of 
FPGAs,  the  values  of  all  user  flip-flops,  RAMs  (LUT  and 
block),  and  the  combinational  outputs  of  IOBs  and  CLBs 
are  available  for  inspection  through  readback  [3,  4]. 

Because  of  the  excellent  visibility  that  readback  pro¬ 
vides  into  the  operation  of  circuits,  many  CCM  platforms 
have  either  employed  the  FPGA  readback  capability  in  a 
debugging  environment  or  provided  at  least  an  application 
programmer’s  interface  (API)  for  accessing  the  readback 
state  of  the  FPGAs.  For  instance,  the  Splash  2  run-time  en¬ 
vironment  [5]  provided  both  an  API  as  well  as  a  run-time 
debugging  tool  named  t2  which  could  be  used  to  execute 
and  debug  a  design.  The  tool  provided  a  facility  to  watch 
the  values  of  signals  as  designs’  clocks  were  stepped — a 
functionality  utilizing  the  Xilinx  XC4000’s  readback  capa¬ 
bility.  As  another  example,  the  run-time  libraries  provided 
with  Annapolis  Micro  System’s  Wildforce  boards  [6]  have 
several  readback-related  API  calls  for  creating  and  manip¬ 
ulating  readback  symbol  tables  as  well  as  for  retrieving  the 
readback  data  itself. 

In  our  experiences  using  Splash  2  and  Wildforce,  the 
readback  symbol  table  mechanisms  have  been  very  useful 
but  also  limited.  At  times,  signal  names  would  be  modi¬ 
fied  as  the  Xilinx  tools  would  optimize  the  designs,  mean¬ 
ing  that  we  would  have  to  go  searching  through  the  known 
readback  symbols  to  find  the  new  names  of  the  signals  we 
needed.  In  other  words,  the  users  themselves  had  to  de¬ 
termine  the  logic  al-to-physical  mapping  for  some  signal 
names  due  to  the  run-time  software’s  incomplete  knowl¬ 
edge  of  designs’  logical-to-physical  mappings. 

This  limitation  is  related  to  the  amount  of  extra  work 
required  to  resolve  the  logical-to-physical  mappings  in  the 
VHDL  or  Verilog  design  environments  for  CCMs.  To  pro¬ 
vide  a  comprehensive  hardware  debugging  system  for  a 
CCM  run-time  system  using  traditional  HDL  tools,  three 
mappings  must  be  determined.  First,  the  synthesis  map¬ 
ping  of  the  HDL  design  to  an  FPGA  vendor’s  library  must 
be  understood,  which  probably  involves  parsing  the  netlists 
resulting  from  synthesis  and  comparing  the  netlist  with  the 
HDL  design.  This  task  is  not  easy  when  the  synthesis  tool 
performs  many  optimizations.  Second,  a  correlation  must 
be  made  between  the  netlist  of  the  synthesized  design  and 
the  physical  form  created  by  the  FPGA  vendor’s  imple¬ 
mentation  tools.  The  final  mapping  from  the  original  HDL 
design  to  the  physical  implementation  can  be  derived  from 
the  first  two  mappings.  Even  without  the  synthesis  step. 


the  relationship  between  a  structural  user  design  and  the 
physical  FPGA  implementation  is  not  easy  in  the  presence 
of  design  optimizations  made  by  FPGA  vendors’  software. 

Instead  of  determining  these  mappings,  CCM  run-time 
software  generally  uses  net  names  as  a  means  of  relating 
the  results  of  an  FPGA  readback  to  the  current  state  of  a 
user’s  design.  As  an  example,  early  IHDL  readback  im¬ 
plementations  used  this  approach.  IHDL  would  extract  net 
name  information  from  the  map  report  file  ( .  mrp  file)  and 
the  logical  allocation  file  ( .  11  file)  created  by  the  Xilinx 
FPGA  implementation  tools.  The  .  mrp  file  records  how 
the  circuit  is  mapped  to  FPGA  resources  and  how  it  was 
optimized  during  technology  mapping.  The  .  11  file,  on 
the  other  hand,  reports  the  readback  bitstream  locations  of 
the  outputs  sampled  from  flip-flops,  CLBs,  and  IOBs  as 
well  as  the  locations  of  block  and  LUT  RAM  contents  in 
the  readback  bitstream.  Entries  in  the  .  1 1  file  for  the  sam¬ 
pled  outputs  of  CLBs,  IOBs,  and  flip-flops  are  annotated 
with  the  name  of  the  nets  attached  to  those  outputs  in  the 
physical  circuit.  Thus,  assuming  a  certain  net  is  connected 
to  the  output  of  a  flip-flop  both  in  the  logical  design  and  its 
physical  implementation,  a  correlation  between  the  logical 
and  physical  designs  can  be  made.  Unfortunately,  using 
these  two  files  alone  in  making  the  association  between  the 
logical  and  physical  can  be  problematic  for  several  reasons: 

•  Due  to  circuit  optimizations  and  hierarchical  net  name 
flattening,  it  is  not  always  simple  or  even  possible  to 
relate  net  names  from  the  original,  logical  design  to 
those  found  in  the  final,  physical  design  based  only 
on  the  information  from  these  files.  The  .mrp  file 
reports  design  changes,  including  net  name  changes, 
resulting  from  optimizations  performed  during  tech¬ 
nology  mapping,  but  name  flattening  for  nets  crossing 
hierarchical  boundaries  is  not  reported.  This  makes 
the  process  of  tracking  net  name  changes  difficult,  if 
not  impossible,  without  parsing  netlists  or  otherwise 
knowing  the  exact  structure  of  the  original  circuit. 

•  RAM  entries  in  the  .  1 1  file  are  associated  with  phys¬ 
ical  locations  on  the  FPGA  alone  and  not  with  any  net 
names.  Though  the  outputs  of  the  CLB  are  associated 
with  net  names,  there  are  several  cases  where  it  is  not 
possible  to  determine  which  output  of  the  CLB  cor¬ 
responds  to  which  LUT  RAM  based  on  the  informa¬ 
tion  of  these  two  files.  Thus,  the  net  name  association 
method  fails  occasionally  when  associating  physical 
LUT  RAMs  to  their  logical  counterparts. 

•  As  with  other  uses  of  LUTs,  LUT  RAMs  can  have 
their  address  pins  permuted  by  the  place  and  route 
tools.  This  makes  relating  logical  and  physical  LUT 
RAM  contents  a  problem  since  the  .  1 1  and  .  mrp 
files  have  no  information  regarding  how  these  inputs 


to  the  RAMs  were  permuted.  As  far  as  we  know,  this 
information  is  only  available  in  the  placed  and  routed 
Native  Circuit  Description,  or .  ncd,  file  and  a  few  de¬ 
rived  files. 

Considering  these  short-comings,  it  is  no  surprise  that  most 
CCM  run-time  software  packages  do  not  support  the  read- 
back  of  LUT  RAMs.  Clearly,  this  methodology  of  relating 
logical  design  elements  to  physical  circuit  elements  can¬ 
not  be  100%  accurate  or  complete  and  another  approach  is 
needed. 

Another  difficulty  of  many  CCM  hardware  debugging 
environments  is  that  the  design,  simulation,  and  execution 
environments  are  different  and,  potentially,  unrelated.  For 
instance,  in  some  CCM  synthesis  environments,  the  simu¬ 
lator  and  synthesis  tools  are  separate  and  provided  by  dif¬ 
ferent  CAD  tool  vendors.  As  a  result  of  this  and  the  limited 
logic  al-to-physical  mapping  used  by  many  CCM  hardware 
debugging  environments,  some  correspondences  between 
the  HDL  design  and  its  physical  implementation  must  be 
made  manually  by  an  experienced  designer. 


3  JHDL  and  Hardware  Execution 

As  discussed  in  [2],  the  JHDL  design,  simulation,  and 
execution  environment  provides  a  unified  view  of  a  de¬ 
sign’s  simulation  and  hardware  execution — they  look  the 
same  in  the  various  design  views.  During  simulation,  the 
simulator  itself  is  responsible  for  keeping  track  of  the  val¬ 
ues  and  state  of  the  circuit  elements  in  a  design.  To  make 
this  unified  view  possible  in  hardware  mode,  the  sampled 
state  of  the  FPGAs  from  readback  is  used  to  recreate  the 
entire  state  of  the  JHDL  design  within  the  simulator. 

In  striving  for  this  unified  view  of  designs,  we  have  de¬ 
veloped  a  reasonable  solution  to  relating  both  LUT  RAM 
and  flip-flop  state  in  the  readback  bitstream  to  the  user’s 
design.  This  was  simplified  by  several  factors  relating  to 
both  JHDL  and  the  Xilinx  FPGA  implementation  software. 
JHDL  designs  are  structural  in  nature  rather  than  synthe¬ 
sized;  this  means  that  at  netlist  time  a  complete  description 
of  the  circuit  is  known.  As  for  the  Xilinx  software,  the  in¬ 
stance  names  placed  in  EDIF  netlists  are  preserved  all  the 
way  through  the  design  flow,  even  down  to  the  placed-and- 
routed  design  found  in  the  .  ncd  file.  These  two  features 
allow  a  simple  one-to-one  correspondence  to  be  made  be¬ 
tween  circuit  elements  at  the  logical  and  physical  levels. 

The  next  few  sections  of  the  paper  will  describe  the  pro¬ 
cess  of  how  we  take  advantage  of  these  features  to  correlate 
all  the  state  provided  by  readback  to  their  corresponding 
elements  in  a  XC4000-based  design.  Despite  the  fact  that 
our  discussion  involves  JHDL  most,  if  not  all,  of  the  tech¬ 
niques  we  will  describe  can  be  used  in  other  FPGA  design 


methodologies. 


4  Creating  Designs  and  Netlists 

As  discussed  in  [1,  2],  a  JHDL  user  design  is  de¬ 
scribed  in  Java  using  libraries  from  the  JHDL  envi¬ 
ronment.  The  circuit  can  be  described  using  hier¬ 
archy  and  generally  involves  the  instantiation  of  de¬ 
sign  primitives  which  can  be  either  from  a  technology- 
specific  or  technology-independent  library.  Technology- 
specific  library  elements  directly  implement  the  library 
elements  found  in  an  FPGA  vendor’s  design  libraries, 
while  the  technology-independent  library  elements  from 
the  byucc  .  jhdl .  Logic  package  instance  technology- 
specific  library  elements  via  the  JHDL  Techmapper 
API,  which  maps  the  generic  library  elements  of  the 
byucc  .  jhdl .  Logic  library  to  the  corresponding  li¬ 
brary  elements  of  a  specific  FPGA  technology.  In  addition 
to  library  primitives,  a  designer  can  use  parameterizable 
module  generators  which  create  optimized  circuits  based 
on  library  primitives;  commonly  used  module  generators 
include  adders,  multipliers,  and  counters. 

As  a  result  of  this  structural  design  methodology  em¬ 
ployed  in  JHDL,  we  know  a  priori  how  the  design  will 
map  to  device  primitives.  With  this  knowledge  and  the  fact 
that  instance  names  are  preserved  by  the  Xilinx  software, 
net  names  are  no  longer  needed  as  the  primary  means  of 
relating  readback  information  to  the  user’s  design.  Instead, 
the  instance  names  of  flip-flops  and  RAMs  themselves  can 
be  used  to  make  the  correlation  between  readback  informa¬ 
tion  and  the  design. 

When  the  design  is  ready  to  be  implemented  using  the 
Xilinx  FPGA  tools,  the  JHDL  board  models  create  both  an 
EDIF  netlist  of  the  circuit  and  a  second  netlist  consisting 
only  of  circuit  elements  whose  state  can  be  sampled.  This 
second  netlist,  called  an  .  rbsym  file,  records  the  instance 
name  and  the  library  element  name  for  each  circuit  element 
as  well  as  the  names  of  its  input  and  output  ports  and  the 
nets  attached  to  these  ports.  As  far  as  port  and  net  names 
are  concerned,  only  the  input  ports  and  net  names  for  LUT 
RAMs  are  actually  needed,  as  we  will  demonstrate  later. 

Figure  1  provides  sample  .rbsym  entries  for  an  input 
IOB  flip-flop  (ifdx)  and  a  synchronous  16x1  LUT  RAM. 


5  Creating  Symbol  Tables 

The  most  difficult  task  in  making  hardware  execution 
look  like  simulation  to  the  user  is  the  creation  of  a  sym¬ 
bol  table  which  relates  a  JHDL  circuit  element  to  its 
state  information  found  in  the  readback  bitstream.  The 


/PEl/IOB_Left/if d-O/if dx-0  byucc . jhdl . Xilinx . XC4000 . if dx  out  q  /PEl/IOB_Left/ifd-0/q 
/PEl/LogicCore/raml6xls-0  byucc . jhdl . Xilinx . rami 6xls  in  a<0>  /PEl/LogicCore/CountlBuf ln<0> 
in  a<l>  /PEl/LogicCore/CountlBuf In<l>  in  a<2>  /PEl/LogicCore/Count lBuf In<2> 
in  a<3>  /PEl/LogicCore/CountlBuf In<3>  out  o  /PEl/LogicCore/RAMOut 

Figure  1:  Sample  entries  from  an  .  rbsym  file 


Figure  2:  Process  of  Physical-To-Logical  Mapping 


XC4KTo JHDLSyms  class  from  the  JHDL  readback  API 
performs  this  task  using  the  information  from  the  .  rb¬ 
sym  file  and  three  different  types  of  Xilinx  files:  .11 
files,  .mrp  files,  and  .xdl  files.  The  .xdl  file  used 
in  our  methodology  is  a  textual  description  of  the  final, 
physical  FPGA  design  and  is  equivalent  to  the  .ncd  file 
of  the  placed  and  routed  design;  the  file  itself  is  written 
in  the  Xilinx  Design  Language  (XDL).  Below  we  will  de¬ 
scribe  the  important  information  provided  by  each  of  these 
files  and  their  roles  in  constructing  the  final  readback  sym¬ 
bol  table  for  hardware  execution,  or  .  rbentry  file.  Fol¬ 
lowing  these  descriptions,  we  describe  the  actual  process 
of  determining  the  mapping  between  logical  circuit  ele¬ 
ments  and  their  physical  implementations  when  generating 
an  .rbentry  file.  For  convenience.  Table  1  provides  a 
summary  of  the  files  used  in  the  process  and  Figure  2  pro¬ 
vides  a  graphical  representation  of  the  process. 

5.1  Determining  Readback  Bitstream  Locations 

One  type  of  information  needed  to  create  the  .rben¬ 
try  file  is  a  mapping  between  a  LUT  or  flip-flop  at  a 
given  physical  location  on  the  FPGA  and  the  offset  for  its 
state  in  the  readback  bitstream.  As  discussed  earlier,  the 
.  1 1  file  provides  exactly  this  information.  Additionally, 
the  net  names  that  are  associated  with  the  outputs  of  flip- 
flops,  CLBs,  and  IOBs  in  the  file  can  now  be  ignored  since 
we  can  develop  a  one-to-one  mapping  between  JHDL  cir¬ 
cuit  elements  and  physical  FPGA  entities  based  on  instance 
names. 

Figure  3  provides  some  sample  entries  from  an 
XC4062XL  design’s  .  1 1  file.  The  entries  are  for  the  out¬ 


put  flip-flop  of  Pad  P8 1  ’s  IOB,  an  FFX  flip-flop  in  the  CLB 
located  at  R47C12,  and  the  F  LUT  RAM  bit  with  address 
15  from  the  CLB  located  at  R48C12.  The  first  number 
in  each  entry  is  the  absolute  offset  of  the  state  bit  in  the 
readback  bitstream  while  the  following  two  numbers  cor¬ 
respond  to  the  bit’s  frame  number  and  frame  offset.  The 
absolute  offset  is  related  to  the  frame  number  and  frame 
offset  in  the  following  manner  for  the  XC4000  family  of 
FPGAs: 

of  f  SCtabs  — 

( framesize  —  start-bits  —  stop-bits) 

■frame-number  —  of  f  set  frame  (1) 

where  of  fsetabs  is  the  absolute  offset,  framesize  is  the 
size  of  the  configuration  frame  in  bits  for  the  XC4000  fam¬ 
ily  FPGA  being  used,  start-bits  and  stop-bits  correspond 
to  the  number  of  start  bits  and  stop  bits  per  configuration 
frame,  and  frame-number  and  of  fsetframe  are,  respec¬ 
tively,  the  frame  number  and  frame  offset  provided  by  the 
.  11  file.  For  example,  in  the  first  .  11  entry  provided  in 
Figure  3,  the  framesize  for  a  XC4062XL  is  613  bits  [7] 
while  the  number  of  frame  start  and  stop  bits  are  1  and  4, 
respectively.  Thus,  the  absolute  offset  is  determined  as: 
(613  -  1  -  4)  •  1624  -  605  =  986787. 

As  Eq.  1  demonstrates,  the  absolute  offset  does  not  ac¬ 
count  for  the  start  and  stop  bits  for  frames.  Further,  through 
trial  and  error,  we  determined  that  the  offset  also  ignores 
any  header  information  before  the  first  frame.  So,  when  a 
readback  bitstream  is  obtained,  we  must  account  for  non¬ 
data  bits  in  the  bitstream  when  given  an  absolute  offset 
from  the  .  11  file.  The  following  equation  provides  this 


Name 

Creator 

Description 

. rbsym 
.11 

.  mrp 

.  ncd 

.  xdl 

. rbentry 

JHDL 

Xilinx,  bit  gen  -1 

Xilinx,  map 

Xilinx,  par 

Xilinx,  xdl 

JHDL,  XC4KTo  JHDLSyms 

Netlist  of  logical  circuit  elements  for  readback 

Lists  the  locations  of  sampled  FPGA  state  in  the  readback  bitstream 
Reports  design’s  mapping  to  FPGA  resources  and  optimizations 

Binary,  closed  description  of  a  design’s  FPGA  physical  implementation 
Textual,  open  description  of  a  design’s  FPGA  physical  implementation 
Relates  logical  design  elements  to  locations  in  bitstream 

Table  1:  Summary  of  Files  Used  for  Creating  Readback  Symbol  Tables 


Bit 

986787 

1624 

605 

Block=P81 

Latch=OQ  Net=PE_ 

_Right_Out<8> 

Bit 

1062819 

1749 

573 

Block=CLB_ 

_R47C12  Latch=XQ 

Net=LogicCore/ Count lBuf ln<0> 

Bit 

1071314 

1763 

590 

Block=CLB_ 

_R48C12  Ram=F : 1 5 

Figure  3:  Sample  Entries  from  an  .  11  File 


mapping: 

of  f  SCtactual  —  of  f  S€tabs 

Offsetabs  \ 

fram.esize  —  start-bits  —  stop-bits 
■  ( start-bits  +  stop-bits) 

+start-bits  +  header-bits  (2) 

where  of  fsetactuai  is  the  bit  offset  used  with  a  readback 
bitstream  accounting  for  non-data  bits  and  header -bits  are 
the  number  of  bits  before  the  start  bit  of  the  first  bitstream 
frame — all  other  variables  are  the  same  as  before. 

Several  additional  tidbits  of  information  relating  to 
readback  bitstream  offsets  are  required  to  handle  the  map¬ 
ping  of  logical  to  physical  LUT  RAM  address  bits.  The 
RAM  bit  addresses  provided  in  the  .  1 1  file  are  the  phys¬ 
ical  addresses  of  the  LUTs  assuming  that  FI  and  G1  are 
the  MSBs  for  addresses  of  F  and  G  LUTs,  respectively, 
and  that  the  LSBs  of  the  addresses  are,  of  course,  F4  and 
G4.  For  32x1  LUT  RAMs,  the  F  LUT  contains  the  first 
16  RAM  bits  (addresses  0-15)  and  the  G  LUT  contains  the 
second  16  bits  (addresses  16-31);  32x1  LUT  RAMs  are  de¬ 
noted  using  “RAM=M:<addr>”  instead  of  using  the  LUT 
name  as  in  the  example  above. 

When  XC4KTo JHDLSyms  parses  the  .11  file,  it 
records  useful  readback  bitstream  entries  in  one  of  two 
hash  tables  based  on  what  they  represent — RAMs  are 
recorded  in  what  we  will  call  the  LLRAMHash  while  all 
flip-flops  are  placed  in  the  LLBlockHash.  The  class  en¬ 
ters  the  readback  bitstream  locations  for  individual  flip- 
flops  into  the  LLBlockHash,  using  the  physical  locations 
of  flip-flop  outputs  as  hash  keys.  Similarly,  it  places  the  16 
or  32  bitstream  locations  for  a  LUT  RAM  in  a  single  LL¬ 
RAMHash  entry,  using  the  RAM’s  physical  location  and 


RAM  depth  as  a  hash  key.  These  hash  tables  are  used  later 
to  relate  physical  circuit  elements  to  their  location  in  the 
readback  bitstream. 

5.2  Handling  Design  Optimizations 

Since  the  instance  names  of  a  design’s  circuit  elements 
can  be  used  to  perform  the  mapping  between  its  logical  and 
physical  forms,  the  .  mrp  file  is  not  needed  to  perform  the 
primary  logical-to-physical  mapping  between  these  forms 
as  with  the  mapping  method  using  net  names  alone.  But, 
since  we  still  need  to  be  able  to  take  a  logical  LUT  RAM 
address  from  a  user’s  design  and  map  it  to  the  physical  ad¬ 
dress  in  the  final,  physical  circuit,  we  are  still  interested 
in  the  net  names  connected  to  the  address  pins  of  LUT 
RAMs  and  how  their  names  are  modified  due  to  optimiza¬ 
tion.  Thus,  the  .  mrp  file  is  still  important. 

The  “Merged  Signals(s):”  subsection  of  the  .  mrp  file’s 
“Removed  Logic”  section  is  the  only  subsection  needed  for 
keeping  track  of  how  net  names  are  modified  due  to  circuit 
optimizations.  A  typical  entry  in  this  subsection  of  the  file 
is: 

The  signal  "PE_Right_Out<0>"  was  merged  into 

signal  "LogicCore/Count lBuf ln<0>" . 

The  first  signal  in  the  entry  is  the  original  name  of  the 
signal  while  the  second  is  its  new  name.  To  make  net-name 
resolution  a  bit  more  complicated,  the  new  signal  may  it¬ 
self  be  merged  into  other  nets,  causing  the  signal  to  receive 
yet  another  name.  Lor  instance,  net  A  may  be  merged  into 
net  B,  while  net  B  is  merged  into  net  C;  thus,  net  A  from 
the  original  design  has  been  merged  into  both  B  and  C,  but 
only  net  C  exists  in  the  final  physical  design.  So,  several 


Find  the  newName  in  the  MergedSignalsHash; 

Find  the  newName  in  the  MergedToHash; 

Find  the  oldName  in  the  MergedSignalsHash; 

if (newName  has  not  been  merged  into  other  nets,  i.e.  it  is  not  in  the  MergedToHash)  { 
if (entry  for  newName  doesn't  exist  in  MergedSignalsHash)  { 
add  entry  for  newName  in  MergedSignalsHash; 

add  oldName  to  the  oldNames  list  in  newName' s  MergedSignals  object  in  the 
MergedSignalsHash; 

} 

else  { 

add  oldName  to  the  oldNames  list  in  newName' s  MergedSignals  object  in  the 
MergedSignalsHash; 

} 

if (other  signals  have  been  merged  into  the  oldName,  i.e.,  oldName  has  an  entry  in 
the  MergedSignalsHash)  { 

add  the  oldNames  list  from  oldName' s  MergedSignal  object  to  the  oldNames  list  in 
newName' s  MergedSignal  object  in  the  MergedSignalHash; 
delete  oldName' s  MergedSignalHash  entry; 

} 

add  a  MergedTo  entry  into  the  MergedToHash  for  the  oldName; 

} 

else  {  /*  aliases  exist  for  the  new  name  */ 

set  mergedlntoNet  to  the  name  of  the  net  into  which  the  newName  was  merged  using  the 
MergedToHash  entry  for  newName; 
while (mergedlntoNet  also  has  an  entry  in  the  MergedToHash)  { 

set  mergedlntoNet  to  the  name  of  the  net  into  which  the  last  mergedlntoNet  was  merged; 

} 

add  the  old  name  to  the  oldNames  list  in  the  mergedlntoNet ' s  MergedSignal  object  in 
the  MergedSignalsHash; 

if (other  signals  have  been  merged  into  the  oldName,  i.e.,  oldName  has  an  entry  in 
the  MergedSignalsHash)  { 

add  the  oldNames  list  from  oldName' s  MergedSignal  object  to  the  oldNames  list  in 
mergedlntoNet ' s  MergedSignal  object  in  the  MergedSignalHash; 
delete  oldName' s  MergedSignalHash  entry; 

} 

add  a  MergeTo  entry  into  the  MergedToHash  for  oldName  referring  to  mergedlntoNet  as 
the  new  name; 


Figure  4:  Pseudo-code  for  Resolving  Net  Name  Aliases 


levels  of  merging  must  be  handled  to  determine  what  some 
nets’  final  names  are. 

The  XC4KTo JHDLSyms  class  performs  a  single-pass 
algorithm  to  determine  the  final  name  of  nets  which  have 
been  merged  one  or  more  times.  The  algorithm  uses  two 
hash  tables  to  quickly  perform  the  name  resolution.  The 
MergedSignalsHash  has  entries  keyed  on  the  new 
name  for  the  nets.  Each  entry  is  a  MergedSignals  ob¬ 
ject  which  associates  a  new  name  to  a  list  of  old  (orig¬ 
inal)  net  names  since  several  nets  can  be  merged  into  a 
single  net.  The  second  hash  table,  MergedToHash,  has 
entries  keyed  on  the  old  names  of  nets.  Each  entry  is  a 
MergedTo  object  which  records  the  new  net  name  for 
each  old  net  name.  If  a  signal  has  been  merged  into  an¬ 
other  signal,  it  has  a  corresponding  entry  in  the  Merged¬ 
ToHash.  Pseudo-code  for  the  name  resolution  algorithm 
is  provided  in  Figure  4. 

When  all  of  the  merged  signal  entries  from  the  .  mrp 


file  have  been  added  into  the  MergedSignalsHash, 
multiple  levels  of  net  name  aliases  have  been  resolved,  if 
needed,  and  this  hash  table  can  be  used  to  quickly  find 
the  names  of  the  nets  into  which  other  signals  have  been 
merged.  Specifically,  once  the  MergedSignals  object 
for  a  net  is  found  in  the  MergedSignalsHash,  the 
oldNames  list  of  that  object  can  be  used  to  discover  what 
nets  have  been  merged  into  the  net  represented  by  the  ob¬ 
ject. 

5.3  Associating  Physical  and  JHDL  Elements 

The  last  set  of  information  needed  to  create  a  readback 
symbol  table  is  the  final  placement,  logical  names,  and 
interconnection  of  the  readback  entities  (RAMs  and  flip- 
flops)  in  the  physical  design  implemented  by  the  Xilinx 
FPGA  tools.  This  information  is  contained  in  the  placed- 
and-routed  .ncd  file  for  the  design.  The  format  of  the 


.  ncd  file  is  not  publicly  known,  but  Xilinx  provides  a  tool 
named  xdl  to  convert  the  .  ncd  file  into  a  published,  tex¬ 
tual  file  format  called  the  Xilinx  Design  Language  (XDL) 
which  retains  all  of  the  information  of  the  .  ncd  file.  In 
other  words,  the  file  describes  the  configuration  of  every¬ 
thing  on  an  FPGA  in  a  textual  format.  To  reduce  the  size 
of  XDL  files,  all  the  information  about  programmable  in¬ 
terconnection  points  (PIPs)  can  be  turned  off  since,  in  our 
case,  the  actual  routes  taken  by  nets  is  not  needed  to  create 
our  readback  symbol  table. 

As  we  mentioned  earlier,  the  Xilinx  FPGA  tools  pre¬ 
serve  the  design’s  instance  names  for  each  RAM  and  flip- 
flop  used  in  the  design,  providing  an  easy,  one-to-one 
mapping  between  the  physical  and  logical  designs.  Since 
these  names  are  preserved,  XC4KTo JHDLSyms  records 
the  physical  location  and  both  the  physical  and  logical 
names  of  these  state  elements  as  it  parses  the  .  xdl  repre¬ 
sentation  of  the  physical  design.  In  addition,  the  program 
must  also  record  the  configuration  of  the  HI  multiplexers 
for  CLBs,  the  net  names  for  LUT  RAM  inputs,  and  the 
configuration  of  the  I1MUX  and  I2MUX  multiplexers  of 
IOBs.  The  configuration  for  each  CLB’s  HI  multiplexer  is 
used  to  determine  the  net  name  for  the  MSB  of  a  32x1  LUT 
RAM’s  address.  The  rest  of  the  input  net  names  for  LUT 
RAMs  are  identified  by  their  connections  to  the  F1-F4  pins 
on  F  LUTs  and  the  G1-G4  pins  on  G  LUTs;  the  net  names 
for  RAMs  are  recorded  so  the  permutation  of  the  RAM  ad¬ 
dress  bits  can  be  determined  when  trying  to  map  the  values 
of  logical  design  RAM  bits  to  their  corresponding  locations 
in  the  physical  design’s  readback  bitstream.  The  configura¬ 
tion  of  the  I1MUX  and  I2MUX  IOB  multiplexers  are  used 
to  determine  which  IOB  output  is  used  for  the  output  of  the 
IOB’s  input  flip-flop  (INFF).  This  is  important  since  read- 
back  provides  the  values  of  the  II  and  12  IOB  outputs  and 
not  the  value  of  the  INFF  itself. 

To  simplify  the  parsing  process,  XC4KTo  JHDLSyms 
does  several  things.  First,  it  ignores  the  information  pro¬ 
vided  in  XDL  module  definitions  since  the  instance  state¬ 
ments  in  the  physical  design’s  description  completely  de¬ 
scribe  the  circuit.  Second,  since  there  is  no  guarantee 
based  on  the  published  XDL  grammar  that  net  statements 
will  always  follow  block  instances,  the  program  parses  the 
file  twice,  recording  the  configuration  of  CLBs  and  IOBs 
the  first  time  and  then  recording  the  names  of  the  input 
nets  to  LUT  RAMs  the  second  time.  This  allows  us  to 
only  record  the  nets  that  we  are  interested  in  and  to  asso¬ 
ciate  them  immediately  with  the  CLB  containing  the  LUT 
RAMs.  When  XC4KTo  JHDLSyms  has  fully  processed  the 
.  xdl  file,  it  has  entered  all  of  the  FPGA  blocks  (IOBs 
and  CLBs)  which  have  either  flip-flops  or  LUT  RAMs 
into  an  array,  which  we  will  call  the  XDLBlockArray; 
XC4KTo JHDLSyms  will  use  this  array  during  the  final 


stages  of  the  symbol  table  creation  process. 

5.4  Putting  it  all  together 

With  the  data  from  the  .  rbsym  file  created  by  the 
IHDL  environment  at  netlist  time  and  the  data  from  the 
three  Xilinx  files  (.11,  . mrp,  .xdl),  we  now  have 
enough  information  to  create  a  comprehensive  readback 
symbol  table.  The  construction  of  the  readback  symbol 
table  takes  three  additional  steps:  associating  readback  bit- 
stream  locations  to  each  entry  in  the  XDLBlockArray, 
associating  the  information  from  the  XDLBlockArray 
with  the  logical  JHDL  design  elements  from  the  .  rbsym 
file,  and  permuting  the  readback  bitstream  locations  for 
LUT  RAM  bits.  After  these  tasks  have  been  performed, 
the  symbol  table  is  written  to  a  .  rbentry  file  for  use 
with  JHDL  hardware  execution. 

To  perform  the  first  association,  XC4KTo JHDLSyms 
traverses  each  entry  of  the  XDLBlockArray.  Dur¬ 
ing  this  traversal,  each  flip-flop  and  LUT  RAM  in  each 
XDLBlockArray  entry  is  then  found  in  either  the  LL- 
BlockHash  or  LLRAMHash  to  determine  the  readback 
bitstream  location(s)  corresponding  to  that  physical  ele¬ 
ment  on  the  FPGA.  For  CLB  flip-flops  and  IOB  output 
flip-flops  (OUTFF s),  this  is  a  simple  lookup  into  the  LL- 
BlockHash.  For  a  INFF,  the  configuration  of  the  II  MUX 
and  I2MUX  has  to  be  tested  in  the  XDLBlockArray  en¬ 
try  to  determine  which  IOB  output  should  be  looked  up  in 
the  LLBlockHash.  For  RAMs,  the  LLRAMHash  is  used 
for  determining  the  readback  bitstream  locations;  since  all 
16  or  32  locations  are  recorded  in  a  single  entry,  only  one 
lookup  is  required.  At  the  end  of  the  XDLBlockArray 
traversal,  each  IOB  or  CLB  entry  of  the  array  contains  the 
bitstream  locations  for  the  state  of  their  flip-flops  or  RAMs. 

Next,  the  XDLBlockArray  is  again  traversed,  but  this 
time  XC 4 KTo  JHDLSyms  associates  each  RAM  or  flip- 
flop  in  each  entry  with  a  logical  JHDL  circuit  element  from 
the  .  rbsym  file.  When  the  .  rbsym  file  is  read  in,  a  hash 
table  of  logical  JHDL  readback  symbols,  which  we  will 
call  the  RBSymHash,  is  created,  each  entry  being  keyed  on 
the  instance  name  of  the  JHDL  design  elements.  For  each 
flip-flop  or  LUT  RAM  contained  in  an  XDLBlockArray 
entry,  the  corresponding  logical  JHDL  symbol  is  located 
in  the  RBSymHash  based  on  the  instance  name  recorded 
in  the  .xdl  file.  When  found,  the  bitstream  location  cor¬ 
responding  to  each  flip-flop  is  then  recorded  in  the  RB¬ 
SymHash  entry.  For  LUT  RAMs,  the  physical  permuta¬ 
tion  of  the  address  bits  is  recorded  in  the  RBSymHash  en¬ 
tries  in  addition  to  the  readback  bitstream  locations.  Since 
the  logical  net  names  recorded  in  the  RBSymHash  entries 
and  the  physical  net  names  recorded  in  XDLBlockAr¬ 
ray  entries  are  often  different  due  to  circuit  optimiza¬ 
tions,  the  MergedSignalsHash  is  used  to  check  for 


net-name  equivalences  between  these  two  representations 
when  recording  address  bit  permutations.  Once  the  XDL- 
BlockArray  has  been  traversed,  all  flip-flops  and  LUT 
RAMs  which  exist  in  the  physical  implementation  of  the 
JHDL  design  have  been  associated  with  their  logical  JHDL 
counterpart. 

As  the  last  process  before  writing  out  the  .  rbentry 
file,  XC4KTo  JHDLSyms  permutes  the  readback  bitstream 
locations  for  RAM  bits  according  to  the  address  permu¬ 
tations  recorded  in  the  RAM  entries  of  the  RBSymHash. 
When  computing  the  permutations  for  XC4000  family 
parts,  we  had  to  keep  in  mind  several  facts  about  the  ad¬ 
dressing  of  LUT  RAMs.  First,  for  32x1  LUT  RAMs,  the 
input  into  the  H1MUX  is  the  most  significant  address  bit — 
it  is  not  permuted  with  any  other  address  lines  by  the  cur¬ 
rent  Xilinx  FPGA  tools.  Further,  the  permutations  for  the 
address  pins  of  the  F  and  G  LUTs  are  identical  both  in  32x1 
RAMs  and  dual-ported  16x1  RAMs.  Additionally,  the  first 
16  physical  LUT  RAM  addresses  in  a  32x1  RAM  are  in 
the  F  LUT  while  the  next  16  are  in  the  G  LUT.  Lastly,  FI 
and  G1  are  the  MSBs  of  physical  LUT  RAMs’  addresses 
when  performing  the  permutations.  Though  not  taken  ad¬ 
vantage  of  in  our  tools,  we  also  discovered  that  the  ad¬ 
dresses  for  dual-ported  16x1  LUT  RAMs  (raml6xld)  are 
not  permuted  at  all  by  the  Xilinx  FPGA  tools. 

With  the  RAM  address  permutations  handled,  the  RB¬ 
SymHash  entries  are  written  out  in  an  .  rbentry  file  for 
use  by  the  JHDL  hardware  execution  environment.  For 
each  entry,  the  type  of  the  symbol  (flip-flop  or  RAM),  the 
symbol’s  JHDL  instance  name,  and  the  readback  bitstream 
location  of  the  symbol’s  state  is  recorded  in  the  file.  If  a 
certain  logical  JHDL  flip-flop  or  RAM  has  been  optimized 
away  (i.e.,  no  bitstream  locations  were  associated  with  the 
RBSymHash  entry),  the  entry  is  still  written  out,  but  an 
indication  is  made  that  it  was  optimized  away  by  the  Xil¬ 
inx  software.  At  this  point,  the  .rbentry  file  we  have 
created  provides  a  complete  mapping  of  every  flip-flop  and 
LUT  RAM  found  in  the  design’s  physical  implementation 
to  a  corresponding  design  element  in  the  logical  JHDL  cir¬ 
cuit  description. 

6  Hardware  Execution  and  Readback 

With  the  ability  to  generate  complete  readback  symbol 
tables,  we  created  a  CCM  platform-independent,  device¬ 
specific  readback  API  for  JHDL.  Because  of  its  platform 
independence,  the  API  can  easily  be  integrated  into  JHDL 
board  models  and  other  tools.  The  key  feature  which 
makes  this  possible  is  that  the  API  directly  manipulates 
“raw”  FPGA  readback  bitstreams,  which  are  only  depen¬ 
dent  on  the  FPGA  type  and  not  the  platform.  Though  most 
FPGA-based  platforms  have  their  own  APIs  for  handling 


readback,  many  also  provide  access  to  the  “raw”  readback 
bitstreams,  making  our  readback  mechanism  possible. 

Figure  5  illustrates  how  the  JHDL  readback  API  fits  in 
the  JHDL  hardware  execution  environment.  The  API  in¬ 
teracts  directly  with  a  JHDL  board  model  (or  other  tool)  as 
well  as  a  Java  Native  Interface  to  the  native  board  API — an 
API  often  written  in  either  C  or  C++. 


Figure  5:  JHDL  Hardware  Environment  Interactions 

There  are  three  aspects  with  integrating  the  readback 
functionality  into  a  board  model  or  other  design  tool:  the 
Java  Native  Interface  (JNI)  code  for  the  board,  the  read- 
back  management  code,  and  an  API  call  for  generating  the 
.  rbsym  file  for  a  design.  We  will  discuss  each  of  these 
aspects  in  turn. 

The  JNI  code  for  interfacing  JHDL  with  the  board’s  na¬ 
tive  API  must  implement  two  main  functions: 

int  nativeReadBackPE(int  peNum,  byte  []  bitstream) 

This  function  loads  the  readback  bitstream  into  the 
bitstream  byte  array  for  the  processing  element 
(PE)  specified  by  peNum.  An  integer  indicating 
failure  or  success  is  returned. 

int  nativePEType(int  peNum,  ReadBackPEType  peType) 

This  function  loads  information  about  the  PE’s  type 
into  the  peType  object  for  the  PE  specified  by 
peNum.  The  information  provided  includes  the 
FPGA  type  (XC4062XL,  XC4036EX,  etc.);  the 
length  of  the  bitstream’s  header,  i.e.,  the  number  of 
bits  before  the  first  start  bit  of  the  first  configuration 
frame;  and,  an  indication  of  whether  the  MSBs  of 
the  bytes  in  the  array  have  the  lowest  or  the  highest 
bitstream  offset  number  within  the  byte.  An  integer 
indicating  failure  or  success  is  returned. 

These  are  the  only  board-specific  methods  which  the  read- 
back  API  requires  of  the  board  JNI  code  for  readback  op¬ 
eration.  As  Figure  5  suggests,  the  readback  API  calls  these 
JNI  functions  directly. 


The  JHDL  readback  API  provides  the  ReadBack- 
Manager  class  for  managing  readback  data.  The  class 
provides  methods  for: 

•  adding  and  removing  readback  symbols  for  a  specific 
PE  on  the  board, 

•  enabling  and  disabling  readback  for  a  specific  PE, 

•  actually  performing  readback  on  the  board’s  PEs, 

•  and  creating  data  structures  of  the  readback  values  so 
they  can  be  loaded  into  the  IHDL  simulator. 

Those  who  write  board  models  or  other  IHDL-based  tools 
do  not  have  to  worry  about  manipulating  bitstreams  or 
making  the  associations  between  the  IHDL  simulation 
models  and  the  readback  state  of  the  hardware;  this  is 
all  performed  by  the  ReadBackManager,  its  supporting 
classes,  and  XC4KTo  JHDLSyms. 

To  create  the  .  rbsym  netlist  of  IHDL  circuit  elements 
used  by  XC4KTo JHDLSyms,  the  IHDL  readback  API 
provides  the  ReadBackSymbolWriter  class.  When  a 
board  model  creates  an  EDIF  netlist  for  a  circuit,  it  should 
also  instance  a  ReadBackSymbolWriter  object  and 
call  the  writeRBSymlnf  o  method  for  the  same  circuit 
to  create  a  .rbsym  file.  Nothing  additional  is  required  to 
create  these  netlists. 

Currently,  we  have  been  using  this  readback  API  in  our 
models  for  the  Annapolis  Micro  Systems  Wildforce  fam¬ 
ily  of  CCM  boards  as  well  as  ISI's  SLAAC1  family  of 
boards.  From  our  experience,  the  IHDL  readback  API 
performs  readback  with  little  additional  performance  over¬ 
head  when  compared  with  using  just  the  readback  APIs 
provided  with  the  boards.  Additionally,  the  memory  over¬ 
head  for  IHDL’s  readback  API  has  been  on  the  order  of 
only  a  few  megabytes  for  designs  with  tens  of  thousands 
of  readback  values.  The  symbolic  readback  API  cur¬ 
rently  provided  with  the  Wildforce  boards  suffers  from  a 
large  amount  of  memory  overhead,  requiring  as  much  as 
250  MB  of  RAM  for  handling  an  equivalent  number  of 
values — the  API  was  clearly  not  intended  for  large  num¬ 
bers  of  readback  symbols. 

7  Improving  FPGA  Implementation  Tools’ 
Support  for  Determining  Logical-to- 
Physical  Mappings 

Having  wrestled  with  the  task  of  determining  logical-to- 
physical  mappings  for  FPGA  designs,  we  have  several  sug¬ 
gestions  that  FPGA  vendors  can  follow  which  can  greatly 
ease  this  process: 


1 .  Instance  names  for  nets  as  well  as  flip-flops,  RAMs, 
and  other  state  elements  should  be  preserved  as  long 
as  possible  during  the  design  implementation  pro¬ 
cess  ( netlist  conversion,  technology  mapping,  design 
placement  and  routing,  etc.).  The  fact  that  the  XDL 
representation  of  the  placed-and-routed  circuit  still  re¬ 
flected  the  instance  names  of  these  design  elements 
made  their  correspondence  to  the  IHDL  design  almost 
ttivial. 

2.  If  the  design  is  modified  in  any  way,  report  it.  When 
Uying  to  support  the  readback  of  LUT  RAMs  on 
XC4000  parts,  we  discovered  that  when  net  names  are 
flattened  across  a  design’s  hierarchical  boundaries  by 
the  Xilinx  tools,  these  changes  were  not  reported  any¬ 
where,  making  it  hard  to  track  how  net  names  were 
changed  in  the  design.  As  a  result,  this  made  it  es¬ 
pecially  hard  to  determine  the  ordering  of  the  address 
pins  on  LUT  RAMs  since  we  had  no  simple  way  of 
relating  the  names  of  the  nets  in  the  physical  imple¬ 
mentation  with  those  of  the  IHDL  design  when  op¬ 
timizations  and  net  name  flattening  were  performed 
by  the  Xilinx  tools.  Our  work-around  for  this  prob¬ 
lem  is  to  place  buffers  just  before  the  address  pins  of 
the  LUT  RAMs  to  prevent  the  net  names  from  being 
completely  flattened.  Xilinx’s  map  tool  will  optimize 
away  the  buffers  and  change  the  net  names  for  the  ad¬ 
dress  signals,  but  these  changes  are  reported ,  making 
it  easier  to  associate  the  physical  net  names  attached 
to  the  LUTs’  address  pins  to  nets  in  the  IHDL  design. 1 

3.  When  the  state  of  RAMs  can  be  sampled,  the  FPGA 
vendor  tools  should  provide  some  method  of  either 
constraining  the  ordering  of  the  address  pins  or  re¬ 
porting  how  the  address  pins  were  permuted.  This 
addition  alone  would  have  greatly  simplified  the  gen¬ 
eration  of  readback  symbol  tables  since  the  net  names 
of  the  address  signals  would  no  longer  be  needed.  The 
second  option  is  probably  preferred  since  it  allows  the 
place  and  route  tools  to  better  optimize  the  physical 
design  for  speed. 

4.  Make  report  files  so  that  they  can  be  easily  parsed 
by  computer  software.  Making  report  files  available 
in  human-readable  format  is  important,  but  providing 
some  command  option  for  making  the  reports  reason¬ 
able  to  parse  using  a  grammar  would  ease  the  process 
of  determining  logical-to-physical  design  mappings. 
For  instance,  the  .  mrp  file  breaks  lines  at  a  set  loca¬ 
tion  to  make  the  files  more  readable  in  a  text  editor 
or  other  viewer,  but  the  line  breaks  make  the  file  very 

1  We  consider  this  approach  to  be  a  bit  of  a  “hack”. 


difficult  to  describe  with  a  grammar  for  compiler  tools 
such  as  JavaCC  or  lex  and  yacc. 

5.  Vendors  should  provide  designers  with  the  ability  to 
fully  determine  how  the  FPGA  was  configured  using 
an  open  file  format.  We  found  the  XDL  representa¬ 
tions  of  XC4000-based  designs  to  be  quite  important 
for  creating  symbol  tables.  Without  this  information, 
it  would  have  been  hard  to  determine  which  IOB  out¬ 
put  was  used  for  the  output  of  the  INFF,  which  net 
was  attached  to  the  MSB  of  a  32x1  LUT  RAM’s  ad¬ 
dress  pins,  or  how  the  addresses  of  LUT  RAMs  were 
permuted.  This  degree  of  visibility  into  the  physi¬ 
cal  design  was  crucial  for  determining  the  logical-to- 
physical  mapping  of  the  design  and,  thus,  for  creating 
a  complete  and  accurate  readback  symbol  table. 


8  Other  Uses  of  Logical-to-Physical  Map¬ 
pings 

Having  gone  through  the  exercise  of  determining  how 
logical  designs  are  mapped  to  physical  FPGA  circuits,  we 
have  come  to  realize  how  useful  this  information  really  is. 
With  the  information,  the  design  environment  can  help  the 
designer  better  understand  how  designs  were  optimized  as 
well  as  the  characteristics  of  their  FPGA  implementations. 
For  instance,  beyond  the  ability  to  support  hardware  de¬ 
bugging  in  a  simulation-like  environment,  this  information 
can  be  used  to  help  estimate  the  power  consumption  of  cir¬ 
cuits  from  within  the  simulation  environment — we  know 
which  nets  exist  in  the  physical  circuit  and  can  monitor 
their  toggle  rates  in  the  simulator.  Further,  we  might  be 
able  to  extract  capacitance  or  path  length  estimates  from 
the  physical  design  to  provide  more  accurate  power  mod¬ 
els. 

Understanding  these  mappings  also  has  the  potential 
to  allow  quick  modifications  of  designs  without  having 
to  go  through  the  complete  ASIC-style  design  cycle  over 
and  over  again.  As  an  example,  several  designs  we  have 
worked  on  at  BYU  required  only  the  modification  of  ROM 
contents  to  change  the  circuit’s  operational  parameters. 
Through  the  direct  modification  of  bitstreams  via  JBits  [8] 
or  an  XDL  representation  of  the  circuit,  these  design  mod¬ 
ifications  would  take  only  minutes  not  hours  to  complete, 
as  can  happen  when  the  entire  Xilinx  FPGA  design  flow  is 
used.  For  that  matter,  the  run-time  reconfiguration  or  mod¬ 
ification  of  these  same  circuits  becomes  easier  to  perform 
and  manage  with  this  information. 


9  Summary  and  Future  Work 

We  have  discussed  the  utility  of  knowing  the  logical-to- 
physical  mappings  of  FPGA  circuit  designs  in  the  context 
of  hardware  debugging  and  readback.  As  an  illustration  of 
their  utility,  we  have  discussed  a  methodology  for  provid¬ 
ing  platform-independent,  device-specific  support  for  read- 
back  in  JHDL  board  models  and  execution  environments. 
This  API  has  been  successfully  used  to  support  board  mod¬ 
els  for  both  the  Wildforce  and  SLAAC1  families  of  CCMs, 
providing  good  performance  with  little  memory  overhead. 
Since  the  readback  software  for  JHDL  can  completely  de¬ 
termine  the  logical-to-physical  mappings  of  JHDL  designs, 
it  can  provide  developers  of  JHDL  board  environments 
with  a  simple  way  of  adding  readback  capabilities  with¬ 
out  large  amounts  of  additional  coding.  The  JHDL  read- 
back  API  depends  on  the  ability  of  CCM  board  APIs  to 
provide  “raw”  readback  bitstreams  from  FPGA  processing 
elements  to  make  platform  independence  possible. 

In  the  course  of  the  paper,  we  have  also  discussed  in 
detail  how  to  determine  the  logical-to-physical  mappings 
of  all  flip-flops  and  LUT  RAMs  in  optimized,  placed-and- 
routed  XC4000  designs.  Several  key  factors  made  this 
possible,  including  the  preservation  of  designs’  instance 
names  by  the  Xilinx  FPGA  tools  and  the  ability  to  view 
the  complete  configuration  of  the  FPGA  using  XDL.  The 
knowledge  of  how  JHDL  designs  are  mapped  to  Xilinx  de¬ 
sign  primitives  at  netlist  time  also  simplified  the  process. 

In  the  near  future,  we  plan  to  support  these  same  logical- 
to-physical  mapping  capabilities  for  the  Xilinx  Virtex  fam¬ 
ily  of  FPGAs  in  the  JHDL  readback  API  as  well  as  provide 
JHDL  designers  with  feedback  concerning  how  their  de¬ 
signs  were  optimized  by  the  Xilinx  tools.  We  also  plan 
to  explore  other  possible  applications  of  these  mappings, 
including  modeling  power  in  FPGAs,  quick  design  modifi¬ 
cations,  and  controlling  low-level  design  implementation. 
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